Exploiting Multi-Label Information for Noise Resilient Feature Selection
Abstract
In the conventional supervised learning paradigm, each data instance is associated with a single class label. Multi-label learning differs in that data instances may belong to multiple concepts simultaneously, a situation that arises naturally in a variety of high-impact domains, ranging from bioinformatics and information retrieval to multimedia analysis. It aims to leverage the multiple label information of data instances to build a predictive learning model that can classify unlabeled instances into one or more predefined target classes. In multi-label learning, even though each instance is associated with a rich set of class labels, the label information can be noisy and incomplete because the labeling process is both time-consuming and labor-intensive, leading to potentially missing or even erroneous annotations. The existence of noisy and missing labels can negatively affect the performance of the underlying learning algorithms. In addition, multi-labeled data often has noisy, irrelevant and redundant features of high dimensionality. The existence of these uninformative features may also degrade the predictive power of the learning model due to the curse of dimensionality. Feature selection, as an effective dimensionality reduction technique, has been shown to be powerful in preparing high-dimensional data for numerous data mining and machine learning tasks. However, the vast majority of existing multi-label feature selection algorithms either boil down to solving multiple single-label feature selection problems or directly use the imperfect labels to guide the selection of representative features. As a result, they may not be able to obtain discriminative features shared across multiple labels.
In this paper, to bridge the gap between the rich source of multi-label information and its imperfections in practical usage, we propose a novel noise-resilient multi-label informed feature selection framework, MIFS, which exploits the correlations among different labels. In particular, to reduce the negative effects of imperfect label information in obtaining label correlations, we decompose the multi-label information of data instances into a low-dimensional space and then employ the reduced label representation to guide the feature selection phase via a joint sparse regression framework. Empirical studies on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of the proposed MIFS framework.
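The idea of compressing the labels first and then regressing onto the compressed targets with a row-sparse weight matrix can be illustrated with a small sketch. This is a simplified two-stage approximation written for illustration only, not the MIFS algorithm itself: the abstract describes a joint framework, whereas the sketch below assumes a truncated SVD for the label decomposition and an iteratively reweighted least-squares solver for an l2,1-regularized regression. The function name `select_features` and the parameters `n_latent`, `alpha`, `n_iter` and `eps` are hypothetical.

```python
import numpy as np


def select_features(X, Y, n_latent=3, alpha=1.0, n_iter=50, eps=1e-8):
    """Rank features using a low-dimensional label representation.

    Assumed two-stage simplification (not the paper's joint solver):
      1. compress the label matrix Y (n x k) with a truncated SVD;
      2. solve min_W ||XW - V||_F^2 + alpha * ||W||_{2,1} by
         iteratively reweighted least squares.
    """
    # Stage 1: latent label targets V (n x n_latent) from centered labels.
    U, s, _ = np.linalg.svd(Y - Y.mean(axis=0), full_matrices=False)
    V = U[:, :n_latent] * s[:n_latent]

    # Stage 2: row-sparse regression of V on X.
    d = X.shape[1]
    D = np.eye(d)  # reweighting matrix, refreshed on every iteration
    XtX, XtV = X.T @ X, X.T @ V
    for _ in range(n_iter):
        W = np.linalg.solve(XtX + alpha * D, XtV)
        row_norms = np.sqrt((W ** 2).sum(axis=1))
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))

    # Features with larger row norms in W are ranked as more relevant.
    scores = np.sqrt((W ** 2).sum(axis=1))
    return np.argsort(-scores), scores


# Synthetic illustration: only the first three of 20 features drive the labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
Y = np.column_stack([
    X[:, 0] > 0,
    X[:, 1] > 0,
    X[:, 2] > 0,
    X[:, :3].sum(axis=1) > 0,
]).astype(float)

order, scores = select_features(X, Y)
print("top-ranked features:", order[:3])
```

Because the regression targets are the compressed labels rather than the raw annotations, a single flipped or missing label has less direct influence on the ranking; how much this helps in practice depends on the data and on the choice of `n_latent` and `alpha`.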
Similar resources
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
Mutual Information-based multi-label feature selection using interaction information
Multi-label feature selection is regarded as one of the most promising techniques that can be used to maximize the efficacy and efficiency of multi-label classification. However, because multi-label feature selection algorithms must consider multiple labels concurrently, the task is more difficult than single-label feature selection tasks. In this paper, we propose the Mutual Information-based m...
Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine
Different approaches have been proposed for feature selection to obtain a suitable feature subset from among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected due to some measu...
Multi-Label Informed Feature Selection
Multi-label learning has been extensively studied in areas such as bioinformatics, information retrieval and multimedia annotation. In multi-label learning, each instance is associated with multiple interdependent class labels, and the label information can be noisy and incomplete. In addition, multi-labeled data often has high-dimensional noisy, irrelevant and redundant features. As an effective d...
Publication date: 2017